Git and GitHub Tutorial
Overview of Git and GitHub
At a high level, what are git and GitHub?
- git: a version control system that allows you to track changes in your code
- GitHub: a platform that allows you to host your git repositories online/remotely
There are many possible starting points for creating/initializing a GitHub repository:
- (A) Start with an existing remote repository from GitHub;
- (B) Create a new remote repository on GitHub; or
- (C) Start with an existing local repository on your computer.
In this walkthrough, we will be setting up two GitHub repositories:
- dsip-s26: repository with course materials (lectures, code, etc.)
  - To set up this dsip-s26 repository, we will use option (A) above.
  - You won’t be interacting with this repository much besides pulling to receive course materials.
- dsip: your repository for your own work (e.g., labs, final project)
  - To set up this dsip repository, we will use option (B) above.
  - This is the repository that you will be interacting with the most.
Instructions to set up the dsip-s26 repository
In your terminal:
- Navigate to the directory where you want to store the course materials, e.g.,
  cd path/to/directory
- Clone the dsip-s26 repository by running the following command:
  git clone https://github.com/tiffanymtang/dsip-s26.git
  Note: This will create a new directory called dsip-s26 in your current working directory. To see this, you can run ls.
- To update the course materials at any point during the semester, you should navigate into the dsip-s26 directory, e.g.,
  cd dsip-s26
  and run
  git pull
In GitKraken:
- Open GitKraken and click on the “Clone a repo” button.
- In the URL field, enter the following URL: https://github.com/tiffanymtang/dsip-s26. You can select where you want to store this repository on your computer by clicking on the “Browse” button next to “Where to clone to”. Once you are satisfied with the location, click on the “Clone the repo!” button.
- If a pop-up appears asking you whether to open the dsip-s26 repository, go ahead and click on the “Open Now” button.
- To update the course materials at any point during the semester, click on the “Pull” button at the top of the application.
Instructions to set up your dsip repository
Next, we will create your personal dsip repository that you will be using to work on your labs. Unlike the dsip-s26 repository, which was already an existing GitHub repository (and thus you only had to clone it locally), you will be creating your dsip repository from scratch on GitHub.
Go to: https://github.com/ and log in.
Click on the green “New” button (on the left) to create a new repository.
Fill in the following information:
- Owner: your GitHub username
- Repository name: dsip
- Public or Private: Please choose “Private” so that only you (and your added collaborators) can see your repository.
- Initialize this repository with: I would recommend checking the box for “Add a README file” so that you can easily clone the repository to your computer.
- Add .gitignore: For now, you can leave this as “None”.
- Add a license: I would recommend selecting “MIT License” from the dropdown menu, but this is optional.
Click on the green “Create repository” button.
Once you have created the repository, you will be taken to the repository’s main page. We next need to “clone” the (remote) repository to our local computers like we did with the dsip-s26 repository. So following the same steps from before:
In your terminal:
- Navigate to the directory where you want to store your dsip repository, e.g.,
  cd path/to/directory
- Clone the dsip repository by running the following command:
  git clone https://github.com/{your_github_username}/dsip.git
  Note: This will create a new directory called dsip in your current working directory. To see this, you can run ls.
In GitKraken:
- Open GitKraken and click on the “Clone a repo” button.
- In the URL field, enter the following URL: https://github.com/{your_github_username}/dsip. You can select where you want to store this repository on your computer by clicking on the “Browse” button next to “Where to clone to”. Once you are satisfied with the location, click on the “Clone the repo!” button.
- If a pop-up appears asking you whether to open the dsip repository, go ahead and click on the “Open Now” button.
So far, we’ve set up two different GitHub repositories. Next, using your dsip repository, we will go over how to make changes to a repository and push those changes to GitHub.
A typical GitHub workflow
A typical GitHub workflow involves the following four commands:
- First, git pull to download changes from the remote GitHub repository to your local computer
- After making changes to your local repository, git add files that you’d like to stage for your next commit
- Next, git commit to store a “snapshot” of these added changes in your git version history
- Finally, git push to upload these local changes to the remote GitHub repository
To see this workflow in action, let’s make a minor change to our dsip repository. In particular, let’s create a new text file called info.txt that contains the following two lines:
name = "Your Name"
github_name = "Your GitHub Username"
Please place this info.txt file in your dsip folder (i.e., the file path should be dsip/info.txt).
Let’s now go through the four steps of the GitHub workflow. We will look at the equivalent commands using terminal, GitHub Desktop, and GitKraken side-by-side.
- Navigate to the desired repository (i.e., your dsip repository):
  Terminal:
  cd path/to/dsip
  GitKraken: Open your dsip repository in GitKraken (e.g., using the “Browse for a repo” button).
- To pull:
  Terminal:
  git pull
  Recall: “pulling” is the process of downloading changes from the remote GitHub repository to your local computer.
- To add modified/new files to the staging area:
  Terminal:
  git add info.txt
  You may want to check the status of your git repository using git status to see which files have been modified and/or added to the staging area. It is common to run git status before and/or after each step of this workflow when first learning git.
- To commit staged files (with message/description):
  Terminal:
  git commit -m "add info.txt"
  GitKraken: Add a commit message to the “Commit summary” field. Once you are satisfied with the message, click on the “Commit changes” button.
  Tip: It is good practice to keep your commits modular and focused (e.g., each commit should address one bug or add one feature). This makes it easier to track version changes and to revert to a previous version if needed. To help with this, write informative commit messages that describe the changes made in the commit.
- To push:
  Terminal:
  git push
  GitKraken: Click on the “Push” button at the top of the application. After you click on “Push”, the head of the local repository (computer icon) and the head of the remote repository (your GitHub icon) should be aligned at the same commit.
  Recall: “pushing” is the process of uploading changes from your local computer to the remote GitHub repository. If you do not push your changes, they will not be reflected on GitHub and will not be accessible to collaborators.
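If you would like to rehearse the four steps before touching your real dsip repository, the sketch below runs the same workflow in a throwaway sandbox. It is only an illustration under stated assumptions: a local bare repository (remote.git) stands in for GitHub, and the sandbox path, the seed repository, the helper g, and the placeholder name/email are all hypothetical names introduced here, not part of the course setup.

```shell
# Practice sandbox: a local bare repository stands in for GitHub.
# All paths and names below are illustrative only.
sandbox="$(mktemp -d)"
cd "$sandbox"

# Helper so that commits work even if git has no name/email configured yet.
g() { git -c user.name="Your Name" -c user.email="you@example.com" "$@"; }

# Build a stand-in "remote" that already contains a README,
# similar to a freshly created GitHub repository.
git init -q seed
echo "# dsip" > seed/README.md
g -C seed add README.md
g -C seed commit -q -m "initial commit"
git clone -q --bare seed remote.git

# Clone the stand-in remote, then run the four-step workflow.
git clone -q remote.git dsip
cd dsip
g pull -q                        # 1. download any remote changes
printf '%s\n' 'name = "Your Name"' 'github_name = "Your GitHub Username"' > info.txt
git status --short               # info.txt is listed as untracked
g add info.txt                   # 2. stage the new file
g commit -q -m "add info.txt"    # 3. snapshot the staged changes
g push -q                        # 4. upload to the stand-in remote

# The stand-in remote now contains the new commit.
git -C ../remote.git log --oneline
```

Because everything lives under a temporary directory, you can delete the sandbox afterwards without affecting any real repository.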
Lastly, please add tiffanymtang and caiyufei8 as collaborators in your dsip repository so that I and the grader can view your lab submissions. To do this, please:
- Go to your dsip repository on GitHub: https://github.com/{your_github_username}/dsip
- Go to Settings (on the top) > Collaborators (on the left) > Add people (the green button) > Enter tiffanymtang > Click on “Add tiffanymtang to this repository”.
- Repeat the same process to add caiyufei8 as a collaborator.
.gitignore
As you begin working on your labs and final project, you will likely generate some files that you do not want to track with git (e.g., data files, temporary files, compiled files, etc.). For example, the .DS_Store file is a hidden “junk” file that is created by macOS and should not be tracked. Python also generates __pycache__ folders when compiling code, and Jupyter notebooks generate .ipynb_checkpoints folders when running notebooks. These files/folders are not necessary to track and will just clutter your repository.
We can instruct git to ignore these files by creating a .gitignore file in our repository. This file contains a list of files and directories that we want git to ignore and never track.
If you followed the R parts of this walkthrough, then a .gitignore file has already been created automatically (by renv). To find this file in your file manager, you will need to show hidden files (i.e., any files that start with .). To reveal hidden files in your file manager, you can press Ctrl+Shift+. (or Cmd+Shift+. on Mac). If a .gitignore has not yet been created, you can create one manually by opening your favorite text editor and saving an empty file with the name .gitignore.
To add the .DS_Store file to the .gitignore file, you can open the .gitignore file in your text editor and add the following line:
*.DS_Store
Note: the * is a wildcard character that matches any sequence of characters. So *.DS_Store will match any file whose name ends with .DS_Store, and thus, adding the above line to your .gitignore will tell git to ignore all files whose names end in .DS_Store.
Some other files/folders that you should add to your .gitignore file include:
*/data/*
*__pycache__*
*.ipynb_checkpoints*
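It is generally best practice to avoid pushing large data files to GitHub repositories; hence, here we are ignoring the contents of data/ folders. Note that the pattern */data/* only matches data/ folders exactly one level below the .gitignore file (e.g., lab1/data/); to ignore a folder named data at any depth, you can add the line data/ instead. Avoid uploading the datasets to GitHub for your labs!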
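One way to confirm what these patterns actually match is git check-ignore, which reports whether a given path is ignored. The sketch below is a sandbox illustration rather than part of the course setup: the temporary directory and the files lab1/data/big.csv and data/big.csv are hypothetical names chosen for the example.

```shell
# Throwaway sandbox; directory and file names are illustrative only.
sandbox="$(mktemp -d)"
cd "$sandbox"
git init -q .
mkdir -p lab1/data data
touch info.txt .DS_Store lab1/.DS_Store lab1/data/big.csv data/big.csv

# The four patterns discussed in this section.
printf '%s\n' '*.DS_Store' '*/data/*' '*__pycache__*' '*.ipynb_checkpoints*' > .gitignore

# git check-ignore exits with status 0 for an ignored path and 1 otherwise.
for path in .DS_Store lab1/.DS_Store lab1/data/big.csv data/big.csv info.txt; do
  if git check-ignore -q "$path"; then
    echo "ignored:     $path"
  else
    echo "not ignored: $path"
  fi
done
```

In this sandbox the top-level data/big.csv is reported as not ignored, because */data/* requires one directory level in front of data/. Running git check-ignore -v on a path additionally prints which .gitignore line matched it.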
For reference, GitHub has a file size limit of 100 MB per file. Large files close to this limit can dramatically slow down the performance of your repository. If you exceed this limit, GitHub will reject the push, and removing the oversized file from your commit history afterwards is tedious and error-prone (e.g., it is easy to lose work in the process).
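Before committing, it can help to scan your repository for files that are approaching this limit. The sketch below uses the standard find utility; the helper name find_large_files and the 50M threshold are assumptions made for this example (half of the limit mentioned above), so adjust them as you see fit.

```shell
# List files above a size threshold under a directory, skipping the .git folder.
# $1: directory to scan; $2: threshold in find's -size syntax (e.g., +50M).
find_large_files() {
  find "$1" -path "$1/.git" -prune -o -type f -size "$2" -print
}

# Run from the top of your repository: prints files larger than roughly 50 MB.
find_large_files . +50M
```

Any file that shows up here is a good candidate for a data/ folder or for its own line in .gitignore.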
After these changes, your .gitignore file should look something like this:
*.DS_Store
*/data/*
*__pycache__*
*.ipynb_checkpoints*
Please save these changes to your .gitignore file. After saving these changes, you can check the status of your repository again to see that many of the files that you previously saw (e.g., .DS_Store, the data files, …) no longer appear in the list of untracked files.
Take one last moment to check that all of the files remaining in your git status (or GitHub Desktop/GitKraken status view) are files that you’d like to commit and push to your GitHub repository. If you are satisfied with the files that you see, you can now proceed through the usual GitHub workflow of pulling, adding, committing, and pushing your changes to your GitHub repository.
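One caveat worth knowing during this final review: .gitignore only affects files that git is not already tracking. If a file such as .DS_Store was committed before the pattern was added, git keeps tracking it until you remove it from the index with git rm --cached, which leaves the file on disk. The sketch below reproduces that situation in a sandbox; the temporary directory, the helper g, and the commit messages are hypothetical names for illustration.

```shell
# Throwaway sandbox; names below are illustrative only.
sandbox="$(mktemp -d)"
cd "$sandbox"
git init -q .

# Helper so that commits work even if git has no name/email configured yet.
g() { git -c user.name="Your Name" -c user.email="you@example.com" "$@"; }

# Simulate committing .DS_Store before any .gitignore existed.
touch .DS_Store info.txt
g add .DS_Store info.txt
g commit -q -m "accidentally commit .DS_Store"

# Adding the pattern afterwards does not untrack the file.
echo '*.DS_Store' > .gitignore
git ls-files                      # .DS_Store is still listed as tracked

# Stop tracking it while keeping the file on disk.
g rm -q --cached .DS_Store
g add .gitignore
g commit -q -m "stop tracking .DS_Store"

git ls-files                      # tracked files after the fix
git status --short --ignored      # ignored files are marked with !!
```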
Git Cheat Sheet
For a quick reference guide to common git/GitHub commands, please refer to this GitHub Cheat Sheet.